    From Imitation to Prediction, Data Compression vs Recurrent Neural Networks for Natural Language Processing

    In recent studies [1][13][12] Recurrent Neural Networks were used for generative processes, and their surprising performance can be explained by their ability to make good predictions. Data compression is also based on prediction. The question therefore becomes whether a data compressor can perform as well as recurrent neural networks in natural language processing tasks and, if so, whether a compression algorithm is even more intelligent than a neural network in specific tasks related to human language. In our journey we discovered what we think is the fundamental difference between a data compression algorithm and a recurrent neural network.
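
    As a rough illustration of the link between compression and prediction (a minimal sketch, not the experimental setup of the paper; the context string, the candidates and the use of zlib are assumptions made for this example), a general-purpose compressor can score how predictable a continuation is by the number of extra bytes it costs to encode:

        import zlib

        def compressed_len(text: str) -> int:
            # Number of bytes the compressor needs for this text.
            return len(zlib.compress(text.encode("utf-8")))

        def continuation_cost(context: str, candidate: str) -> int:
            # Extra bytes needed to encode the candidate after the context:
            # a crude proxy for how well the compressor "predicts" it.
            return compressed_len(context + candidate) - compressed_len(context)

        context = ("the quick brown fox jumps over the lazy dog. "
                   "the quick brown fox jumps over the lazy ")
        for candidate in ["dog", "cat", "zzzz"]:
            print(candidate, continuation_cost(context, candidate))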

    Hash2Vec: Feature Hashing for Word Embeddings

    In this paper we propose the application of feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks like document classification. In this work we show that feature hashing can be applied to obtain word embeddings in time linear in the size of the data. The results show that this algorithm, which does not need training, is able to capture the semantic meaning of words. We compare the results against GloVe and show that they are similar. As far as we know this is the first application of feature hashing to the word embeddings problem, and the results indicate this is a scalable technique with practical results for NLP applications. Sociedad Argentina de Informática e Investigación Operativa (SADIO).
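
    A minimal sketch of the idea (the dimensionality, window size, distance weighting and hash functions below are illustrative assumptions, not the settings used in the paper): each context word is hashed to a bucket with a random sign, and the embedding of a word is the accumulation of its hashed contexts, so no training pass is required:

        import numpy as np
        from collections import defaultdict

        def hashed_embeddings(tokens, dim=256, window=2):
            # One fixed-size vector per word, filled by hashing its context words.
            vectors = defaultdict(lambda: np.zeros(dim))
            for i, word in enumerate(tokens):
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if i == j:
                        continue
                    context = tokens[j]
                    bucket = hash(("bucket", context)) % dim             # hashing trick: context word -> dimension
                    sign = 1.0 if hash(("sign", context)) % 2 else -1.0  # sign hash reduces collision bias
                    weight = 1.0 / abs(i - j)                            # closer context words count more
                    vectors[word][bucket] += sign * weight
            return dict(vectors)

        emb = hashed_embeddings("the cat sat on the mat while the dog sat on the rug".split())
        print(emb["cat"][:8])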

    Generic LSH Families for the Angular Distance Based on Johnson-Lindenstrauss Projections and Feature Hashing LSH

    In this paper we propose the creation of generic LSH families for the angular distance based on Johnson-Lindenstrauss projections. We show that feature hashing is a valid J-L projection and propose two new LSH families based on feature hashing. These new LSH families are tested on both synthetic and real datasets with very good results and a considerable performance improvement over other LSH families. While the theoretical analysis is done for the angular distance, these families can also be used in practice for the Euclidean distance with excellent results [2]. Our tests on real datasets show that the proposed LSH functions work well for the Euclidean distance. Sociedad Argentina de Informática e Investigación Operativa (SADIO).
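
    A minimal sketch of the construction (the bit count and the exact hashing scheme below are illustrative assumptions, not the precise families proposed in the paper): each vector is projected with a sparse feature-hashing projection instead of a dense Gaussian one, and the sign pattern of the projection is used as the LSH key, so vectors at a small angle tend to share many bits:

        import numpy as np

        def feature_hash_projection(x, n_bits, seed=0):
            # Hashing-trick analogue of a J-L projection: every input coordinate
            # is sent to one output bucket with a random sign.
            rng = np.random.default_rng(seed)
            buckets = rng.integers(0, n_bits, size=len(x))
            signs = rng.choice([-1.0, 1.0], size=len(x))
            y = np.zeros(n_bits)
            np.add.at(y, buckets, signs * np.asarray(x, dtype=float))
            return y

        def angular_lsh_key(x, n_bits=16, seed=0):
            # The sign pattern of the projected vector is the hash key.
            return tuple(int(v > 0) for v in feature_hash_projection(x, n_bits, seed))

        a = np.random.randn(1000)
        b = a + 0.1 * np.random.randn(1000)   # small angular perturbation of a
        matches = sum(u == v for u, v in zip(angular_lsh_key(a), angular_lsh_key(b)))
        print(matches, "of 16 bits match")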

    Luis Argerich to the Governor of the Province of Buenos Aires

    Reports that, as ordered, the war supplies issued for the service of the Auxiliary Army of the Andes (Ejército Auxiliar de los Andes) have been loaded onto the carts of Don Domingo Cruz. Bears a signature of Juan Manuel de Rosas. Copy.

    Dispatcher3 – Machine learning for efficient flight planning: approach and challenges for data-driven prototypes in air transport

    Machine learning techniques to support decision-making processes are in trend. These are particularly relevant in the context of flight management, where large datasets of planned and realised operations are available. Current operations experience discrepancies between the planned and the executed flight plan; these might be due to external factors (e.g. weather, congestion) and might lead to sub-optimal decisions (e.g. recovering delay, and burning extra fuel, when no holding is expected at arrival and the recovery was therefore not needed). Dispatcher3 produces a set of machine learning models to support the flight crew pre-departure, with estimates of expected holding at arrival, runway in use and fuel usage, and the airline's duty manager on pre-tactical actions, with models trained with a larger look-ahead time for ATFM and reactionary delay estimations. This paper describes the prototype architecture and approach of Dispatcher3, with particular focus on the challenges faced by this type of data-driven machine learning model in the field of air transport, ranging from technical aspects such as data leakage to operational requirements such as the consideration and estimation of uncertainty. These considerations should be relevant for projects that try to use machine learning in the field of aviation in general. This work is performed as part of the Dispatcher3 innovation action, which has received funding from the Clean Sky 2 Joint Undertaking (JU) under grant agreement No 886461. The Topic Manager is Thales AVS France SAS. The JU receives support from the European Union's Horizon 2020 research and innovation programme and the Clean Sky 2 JU members other than the Union. The opinions expressed herein reflect the authors' views only. Under no circumstances shall the Clean Sky 2 Joint Undertaking be responsible for any use that may be made of the information contained herein. Postprint (published version).